Inducing Synchronous Grammars with Slice Sampling
Abstract
This paper describes an efficient sampler for synchronous grammar induction under a nonparametric Bayesian prior. Inspired by ideas from slice sampling, our sampler is able to draw samples from the posterior distributions of models for which the standard dynamic programming based sampler proves intractable on non-trivial corpora. We compare our sampler to a previously proposed Gibbs sampler and demonstrate strong improvements in terms of both training log-likelihood and performance on an end-to-end translation evaluation.
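The abstract names slice sampling only as the inspiration. The sketch below is a hedged illustration of that underlying idea, not the paper's grammar sampler. It runs auxiliary-variable slice sampling on a small finite discrete space in Python.

The function name `slice_sample_discrete` and the toy weights are hypothetical. The link to grammars is also an assumption here, not a detail stated in the abstract: one would draw a slice variable per span and keep only rules whose probability exceeds it, which shrinks the chart the dynamic program must fill.

```python
import random
from collections import Counter


def slice_sample_discrete(weights, x0, n_iters, rng):
    """Slice sampling over a finite discrete space (illustrative sketch).

    weights: dict mapping each state to an unnormalized probability > 0.
    Each iteration alternates two exact conditional draws:
      u | x ~ Uniform(0, w(x))           # auxiliary slice variable
      x | u ~ Uniform({y : w(y) >= u})   # uniform over the slice
    Marginalizing u out of p(x, u) ∝ 1[u < w(x)] recovers w(x), so the
    chain leaves the normalized weights invariant without ever summing them.
    """
    x = x0
    samples = []
    for _ in range(n_iters):
        u = rng.uniform(0.0, weights[x])
        # ">=" keeps the current state in the slice even if u rounds up to w(x).
        slice_states = [y for y, w in weights.items() if w >= u]
        x = rng.choice(slice_states)
        samples.append(x)
    return samples


if __name__ == "__main__":
    rng = random.Random(0)
    toy = {"a": 1.0, "b": 2.0, "c": 7.0}  # hypothetical unnormalized weights
    draws = slice_sample_discrete(toy, "c", 50000, rng)
    counts = Counter(draws)
    for state in sorted(toy):
        print(state, counts[state] / len(draws))  # should approach 0.1, 0.2, 0.7
```

This toy version enumerates every state to build the slice, which would defeat the purpose in a large space. The attraction of the approach is that the slice step needs only a threshold test, not a normalizing constant. When states are structured, as chart cells are, the set above `u` can often be constructed directly and is typically much smaller than the full space.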
Similar papers
A Gibbs Sampler for Phrasal Synchronous Grammar Induction
We present a phrasal synchronous grammar model of translational equivalence. Unlike previous approaches, we do not resort to heuristics or constraints from a word-alignment model, but instead directly induce a synchronous grammar from parallel sentence-aligned corpora. We use a hierarchical Bayesian prior to bias towards compact grammars with small translation units. Inference is performed usin...
Synchronous Linear Context-Free Rewriting Systems for Machine Translation
We propose synchronous linear context-free rewriting systems as an extension to synchronous context-free grammars in which synchronized non-terminals span k ≥ 1 continuous blocks on each side of the bitext. Such discontinuous constituents are required for inducing certain alignment configurations that occur relatively frequently in manually annotated parallel corpora and that cannot be generate...
Machine Translation Using Probabilistic Synchronous Dependency Insertion Grammars
Syntax-based statistical machine translation (MT) aims at applying statistical models to structured data. In this paper, we present a syntax-based statistical machine translation system based on a probabilistic synchronous dependency insertion grammar. Synchronous dependency insertion grammars are a version of synchronous grammars defined on dependency trees. We first introduce our approach to ...
Synchronous Constituent Context Model for Inducing Bilingual Synchronous Structures
Traditional Statistical Machine Translation (SMT) systems heuristically extract synchronous structures from word alignments, whereas synchronous grammar induction offers a better alternative that dispenses with heuristic methods and directly obtains statistically sound bilingual synchronous structures. This paper proposes the Synchronous Constituent Context Model (SCCM) for synchronous grammar induction. Th...
A Quasi-polynomial-time Algorithm for Sampling Words from a Context-free Language
A quasi-polynomial-time algorithm is presented for sampling almost uniformly at random from the $n$-slice of the language $L(G)$ generated by an arbitrary context-free grammar $G$. (The $n$-slice of a language $L$ over an alphabet $\Sigma$ is the subset $L \cap \Sigma^n$ of words of length exactly $n$.) The time complexity of the algorithm is $\epsilon^{-2}(n|G|)^{O(\log n)}$, where the parameter $\epsilon$ bounds the variation of the output distr...